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Consider a finite renewal process in the sense that interrenewal times are positive i.i.d. variables 
and the total number of renewals is a random variable, independent of interrenewal times. A 
finite point process can be obtained by probabilistic sampling of the finite renewal process, 
where each renewal is sampled with a fixed probability and independently of other renewals. 
The problem addressed in this work concerns statistical inference of the original distributions 
of the total number of renewals and interrenewal times from a sample of i.i.d. finite point 
processes obtained by sampling finite renewal processes. This problem is motivated by traffic 
measurements in the Internet in order to characterize flows of packets (which can be seen as 
finite renewal processes) and where the use of packet sampling is becoming prevalent due to 
increasing link speeds and limited storage and processing capacities. 

Keywords: IP flows; finite renewal process; interrenewal times; number of renewals; sampling; 
thinning; asymptotic normality; decompounding 

1. Introduction 
1.1. Motivation 

The statistical and probabilistic problems considered in this work are motivated by ques- 
tions arising from data traffic analysis in modern communication networks such as the 
Internet. Over these networks, information is sent in the form of packets, and packets 
are grouped into flows. (A flow corresponds to a group of packets sharing common char- 
acteristics. Ideally, a flow can be thought of as a set of packets that arises in the network 
through a remote terminal session or a web page download.) Each packet carries infor- 
mation about the flow it belongs to and also whether it is the first or the last packet 
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in the flow. Examining each packet then allows all flows to be reconstructed. Knowing 
the structure of flows (flow level characteristics) helps network operators and networking 
researchers to understand and discover characteristics of data traffic. 

A key difficulty with capturing each packet is that this rapidly leads to a huge amount 
of data to store and analyze. For instance, capturing all traffic for a few hours on a 
gigabit /sec. link at a medium load level yields several hundred gigabytes of data. A way 
to reduce the volume of data is by sampling packets. One of the simplest approaches is 
probabilistic sampling, where each packet, independently of the others, is captured and 
analyzed with a fixed probability q. The basic problem, known as the inversion problem 
in the network traffic literature, is then to deduce the structure of the original flows from 
sampled packets. 

This problem has recently attracted much attention in the networking community, 
with the focus almost exclusively on inference of the original distribution of flow sizes 
(total number of packets); see [11, 17, 24]. In this work, we take a closer look at statistical 
properties of the previously considered estimator of the original distribution of flow sizes 
and are also interested in inference of interarrival times between packets from sampled 
data. This distribution allows traffic burstiness to be characterized and leads to other 
major characteristics such as the duration of a flow. 



1.2. Statement of problem and goals 

From a mathematical standpoint, flows and their probabilistic sampling can be described 
as follows. As suggested by data traffic measurements (e.g., [18]), a flow can be modeled 
as a finite renewal process, where the total number of renewals W is random and the 
interrenewal times Di, i = 1, . . . , W — 1, arc positive i.i.d. random variables, independent 
of W. Suppose that the finite renewal process is sampled probabilistically with a fixed 
probability q to obtain a sampled finite point process. (As shown below, the resulting 
sampled point process is generally not a finite renewal process.) The sampled number 
of renewals (which could be zero) and sampled interrenewal times (which are available 
only when the number of sampled renewals is greater than 1) will be denoted by W q 
and D Qt i, i = 1, . . . , W q — 1, respectively. We illustrate this notation in Figure 1, where 
all circles (both empty and full) correspond to the finite renewal process, and full circles 
correspond to the sampled finite point process. 

Given N i.i.d. copies of a sampled finite point process (some of which are empty when 
the number of sampled renewals is zero), one of our main goals is to infer the original 
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Figure 1. Finite renewal process and sampled finite point process. 
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distribution Fd of interrenewal times D,. The focus is on nonparametric inference of Fd, 
statistical properties of the resulting estimator and its performance in simulations. The 
estimator of Fd involves a nonparametric estimator of the original distribution fw of 
the total number of renewals W, which has previously been considered in the network 
traffic context [17]. Statistical properties of the estimator of fw will also be studied here 
for the first time. 

1.3. Inference procedure 

To be able to make inference about Fd, we first need to relate it to Fd .u, where the 
latter indicates the conditional distribution function of D q ^ given W q = s (with i < s). 
As part of this work (see Theorem 3.1 in Section 3), we show that 

oo 

F D ^ ls =J2 A ^nF D m , (1.1) 
m— 1 

where *m denotes the mth convolution. In particular, the right-hand side of (1.1), and 
hence F Dq ,| s , does not depend i. We also note for later reference that the definition of 
the sequence A s = {A SiTra } m6 N involves the distribution fw- Somewhat independently 
of the main objectives above, we examine (1.1) for several underlying distributions fw, 
such as geometric and heavy-tailed, that arise in network traffic studies (e.g., [6]). For 
example, in the geometric case, the distribution F Dq . | s does not depend on s, although 
this seems to be an exception to the general rule. We also provide a result similar to (1.1) 
for a multidimensional vector (-D g .i, . . . ,D qn ). This allows (conditional) dependencies to 
be examined among sampled interrenewal times. For example, when the distribution fw 
is geometric, the sampled interrenewal times turn out to be independent, in which case 
the sampled finite point process is a finite renewal process. We should also note that 
forms of conditioning other than on W q = s arc possible, such as W q > s; these will be 
discussed briefly. 

Having the relation (1.1), we would naturally expect that it can be inverted, in the 
sense that 

oo 

Fd=J2 a ^F D Z\ s , (1-2) 

m— 1 

where a s = {a s . m } m gN is obtained by reversion of A s = {A SiTn } m ^, that is, their respec- 
tive z-transformations (or formal power series) G ag (z) = J2m=i a s-m,z m and Ga b ( z ) = 
Em=i^,™ zm satisfy G as (G Ab (z)) = G As (G as (z)) = z. The relation (1.2) suggests a 
natural estimator of Fd, defined as (see Section 4.2) 

oo 
m— 1 

where F Dq 4 | g is the empirical distribution function of D q _i given W q = s, and a s is the 
reversion of the sequence A s , where the latter is defined as A s by replacing fw in its 
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definition by the empirical distribution fw- Note that the estimator Fp is defined for 
fixed s and i. 

1.4. Contributions to the literature 

Under technical assumptions, we will show that Fd is an asymptotically normal estimator 
of Fjj , namely, 

VN(F D -F D )AX, (1.4) 

where X is a limiting Gaussian process and the convergence in distribution -4 holds in 
a suitable space of functions (Theorem 4.3). The approach and some techniques behind 
this result are closely related to the works in the so-called decompounding framework 
by Buchmann and Griibcl [5], B0gstcd and Pitts [3] (see also [14, 15]). These authors 
consider analogous estimators, but where the sequence A s in (1.1) is known and hence so 
is its reversion sequence a s . We thus deviate from these earlier works by assuming that a s 
also needs to be estimated, which makes the analysis substantially more complex. More 
specifically, the limit X in (1.4) can be written as 

OO / OO \ 

X = - * (C ° a.))nF%, t \. + E a ^nF*£-? * Z, (1.5) 

n=l \n=l ) 

where (C, Z) is a Gaussian process, o is a composition operation, ai 1 ^ is the "derivative" 
of a s and * is the convolution. The second term in (1.5) is the term found in the decom- 
pounding literature when A s and a s are supposed to be known. The first term in (1.5) 
is new and accounts for the variations in a s here. 

Since a s involves the estimator fw via A s , we also need the asymptotic normality result 
for fw- Although fw already appears in the network traffic literature [17], its statistical 
properties have not been studied before to the best of our knowledge, and the asymptotic 
normality result is also derived here. We show that, under suitable assumptions, 

y/N(f w -f w )^S(0, (1.6) 

where S(£) = {S(£) W } W £N is a Gaussian process (Theorem 4.1 and Proposition 4.5). It is 
interesting to note that the result (1.6) is shown under technical assumptions which do 
not cover distributions with heavier tails (such as heavy-tailed distributions). It is proved, 
however, that the assumptions can be thought of as sharp (in the sense of Proposition 
4.4 of Section 4.1). What the asymptotics of fw are beyond these assumptions remains 
an interesting open question. 

We would like to make several other related comments. First, it is well known from es- 
timation of flow sizes in the network traffic context that the performance of the estimator 
fw degrades rapidly as the sampling probability q decreases. For example, for q = 0.1, 
the inference becomes impractical for reasonable sample sizes N. We shall discuss this 
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fact in light of the derived asymptotic normality result for fw (Sections 4.1 and 5.1) by 
indicating two different regimes for the estimator performance in terms of the limiting 
asymptotic variance. The regimes are defined as: 

• stable, if sup E S < oo; 

• explosive, if sup ES(£) W — oo, 

where S(£) is the limiting Gaussian process in (1.6) and ES^)^ is expressed in terms of q, 
and fw or fw g . The performance of the estimator fw is satisfactory in the stable regime, 
but poor in the explosive regime. For small q, fw typically belongs to the explosive regime 
and hence to the case of poor performance. Analogous difficulties for smaller q remain 
when using the estimator Fjj. Because of these practical considerations, in the network 
traffic context, alternative sampling schemes have been sought and considered, such as 
the sample-and-hold method [7, 12], and the dual sampling technique [22]. We plan to 
study inference of interrenewal times in one of these frameworks and postpone any real 
data application to future work. This work will therefore be limited to a simulation study 
(Section 5). 

In another direction, the results on characterizing the sampled number of renewals and 
sampled interrenewal times, such as the relation (1.1), contribute to a substantial liter- 
ature on sampling of point processes. Sampling (also known as thinning) is discussed in 
several manuscripts such as [8, 9, 19]. To the best of our knowledge, sampling (thinning) 
of finite renewal processes has been largely unexplored. The results of our work (Sec- 
tion 3) show that sampling of finite renewal processes does not generally lead to finite 
renewal processes (in that, conditionally on the number of sampled renewals, sampled 
interrenewal times are generally dependent). 

1.5. Structure of the paper 

The rest of the paper is organized as follows. In Section 2 we collect the notation used 
throughout the work and include other preliminaries. Section 3 concerns properties of the 
finite point process obtained from sampling the finite renewal process. Inference of the 
original distributions of the total number of renewals and interrenewal times is studied in 
Section 4. A simulation study can be found in Section 5. For better readability, most of 
the proofs are deferred to Appendix A. The main body of the article contains the proofs 
of only those results which we consider key in individual sections. Finally, Appendix B 
contains a technical result used in Section 4.2, with bounds on remainder terms in Taylor 
expansions of compositions of formal power series. 

2. Notation and other preliminaries 

Here, we introduce notation and make a number of assumptions that will be used through- 
out the paper. As in Section 1, we consider a finite renewal process consisting of a finite 
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but random number W (with W > 1) of renewals and positive i.i.d. interrenewal times 
Di, i = 1, . . . , W — 1, independent of W . The terminology of finite renewal processes here 
follows that of [9], Example 5.3(b), page 125. We denote by D the variable with a com- 
mon distribution of Di. We also let V denote the total duration of the finite renewal 
process defined by V = X)i=i 1 Di (if W = 1, then V = 0). Suppose now that each re- 
newal, independently of the others, is sampled with a fixed probability q £ (0, 1) to form 
a sampled finite point process. We let W q denote the total number of sampled renewals, 
D q ^, i = 1, . . . , W q — 1, the sampled interrenewal times and V q = 53i=i 1 D q .i the total 
duration of the sampled finite point process. 

The following notation will be used many times throughout the paper. For a discrete 
random variable X, its distribution or probability mass function (p.m.f.) will be denoted 
by fx{x) = P(X = x). For example, we shall write fw{w), fw q (w), etc. For a continuous 
random variable Y , its distribution function will be denoted by F Y (y) = P(Y < y) and 
its Laplace transform (LT) will be denoted by F Y (v) = J c~ vy F Y (dy), v>0 (v e 
We shall also use the conditional distribution functions F Y \ s {y) = P(Y < y\W q = s) and 
F Y \ s +(y) = P(Y < y\W q > s). So, for example, we shall write Fjj, Fn, F^ q .\ [S , etc. In 
several instances, we shall use analogous notation, but where Y is replaced by a multi- 
dimensional. 

For a sequence a = {a rl } Iie N i where No = N U {0} and N is the set of natural num- 
bers, we denote its formal power series or its z-transformation by G a (z) = X^^Lo "' 2 " - 
Conversely, any such formal power series is associated with a sequence. When a stands 
for a p.m.f. fx, we shall also write Gx in place of Gf x . Since a sequence a will not 
necessarily be nonnegative, we shall also use the notation a + = {a+}„ 6 N , defined as 
a n = I a n | , and Q a (z) = G a + (z) = X^Lo l a «l z ™- I n several instances, we shall use a multidi- 
mensional power series G a (zi,..., z m ) = X^=o ' ' ' S^=o a n z ™ 1 ■ • • z r " m associated with 
a = {a n }neN™ and n= (m,...,n m ). 

We shall also use the following common operations on sequences (or their formal 
power series) a = {a„}„ e N an d b = {b n }„^n - The composition of a and b will be 
denoted aob and is defined by its formal power series as G ao b{z) = G a (Gb{z)). The 
A;th derivative of a = {a n } n< =fi will be denoted by = {ai fe '}„ e N and is defined by 
G a {k) (z) = d k G a (z)/dz k = J2^=k a n n i n — l)---(n — k+ l)z n ~ k . The reversion of a se- 
quence a = {a n }neN will be defined as a sequence b = with its formal power 
series satisfying Gb{G a {z)) = G a (Gb{z)) = z. The reversion is defined for any sequence 
a = {flrijneN with a\ =^ 0. As usual, the symbol * will stand for convolution in either 
the discrete or continuous setting. With all these notions for sequences and their formal 
power series, we follow the nice monograph of [16]. 

3. From finite renewal process to sampled finite point 
process 

In this section, we study several characteristics of the sampled finite point process in terms 
of the original finite renewal process. We first briefly consider the number of sampled 
renewals and then turn to sampled interrenewal times. 



Probabilistic sampling of finite renewal processes 1291 
3.1. Number of sampled renewals 

The relation between the probability mass functions of the number of sampled renewals 
and the number of original renewals is given by 



fw t oo = E p( - w « = s \ w = w ) p ( w = w ) 

W — S 

= E )q s (l-q) w - s fw(w), s>0. 



(3-1) 



The function G\y q (z) is given by 

c 

G Wq ( Z ) = z s E ( ! M 1 - i) w ~ s fw{w) 



oo oo 

' w 



s—0 w—s 



= YfwH(zq+(l~q)) W (3.2) 

W — l 

= G w (zq+(l-q)), \z\<l. 

The same relations (3.1) and (3.2) also appear in [17] and will be used in Section 4.1 to 
obtain inversion results. 

3.2. Sampled interrenewal times 

Given that s renewals have been sampled, the next result characterizes the LT of the ith 
(i < s) sampled interrenewal time in terms of the LT of the original interrenewal times. 

Theorem 3.1. The LT of D q i given W q = s can be expressed as 

F DqMs (v)=G As (F D {v)), l<i<s,v>0, (3.3) 

where A s = {j4 Sjm } m£ N and A s _ rn denotes the probability that the number of original 
renewals between the ith and (i + l)th sampled renewals is equal to m — 1 given W q = s, 

w — rn' 
s-1 



^J^JT) E /HrH (.:;") (1-9)-'. (3-4) 

jw i\ I w=s +m-l 



Proof. For 1 < i < s and t > 0, we have 

oo 

P(Dq,i <t 7 W q = s) = fw(w)P(D q ,i <t,W q = s\W = w) 



r (3.5) 

= E fw(w)P(W q - s\W = w)P{D q . l < t\W = w,W q = s). 
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Let M denote the number of original renewals not sampled between the ith and (i + l)th 
sampled renewals plus 1. If the number of original renewals is W = w and the number of 
sampled renewals is W q — s, then M can take the values 1,2, ...,w— s + 1, and 



w— s+1 



P(D qii <t\W = w,W q = s)= F% n (t)P(M = m\W = w,W q = s). (3.6) 



By considering the location of the ith sampled renewal in the total number of renewals 
w along with the possible distinct locations of the sampled renewals before the ith and 
after the (i + l)th sampled renewals, we obtain that 

w — m— (s — i— 1) 



P(M = m\W = w,W q = s)= ]T 



l=i 



w 1 

i — 1 J \ s — i — 1)1 \ s , 

(3.7) 

w — rn\ 1 1 w > 



s — 1 / / \ s 

where / is the location of the ith sampled renewal and the last equality follows from the 
identity 



x — b 



E(!:iVV) = L:J' «>1,6>0,o + 6<*. (3.8) 



. a — 1/ \ b J \a + b 

l— a 

By using (3.5)~(3.7), we deduce that 



fw g (s) 

W — S + 1 



^ > > w =s m=l x 7 

oo 

£ A., m Fg»(t), 



m—l 



where A S: , n is given by (3.4). The relation (3.3) follows by taking the LT in (3.9). □ 



Theorem 3.1 implies that, when conditioning on W q = s, the distribution of the «th 
(i < s) sampled interrenewal time depends, in general, only on s and not on i, and 
hence that the times between consecutive sampled renewals are identically distributed 
conditionally on W q . In contrast to the original finite renewal process which assumes 
independence of W and Z?i, the sampled quantities W q and D q ^ are dependent. 
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Other forms of conditioning on the number of sampled renewals are possible. For 
example, from (3.3), the LT of D q i given W q > s can be expressed as 

oo 

Fd^\ s+ (v) = £ F DqM .,(v)P(W q = s'\W q > s) 

s' — s 

= P(W > s) £ £ A s ,, m F D (vrf Wq (s') (3.10) 

^ ^ ' S'—S 771— 1 

= G As+ {F D (v)) 1 

where A s + = {A s + m }meft an d A s + m is the probability that the number of original 
renewals between the ith and (i + l)th sampled renewals is equal to m — 1 given W q > s, 



OO OO / \ 



^9 f ("; M Ka-.) 

v v — y u;=m+s-l \ s'=0 v 



(3.11) 



w — m — s 



In the rest of the paper, we shall focus only on the conditioning W q = s, but similar 
results can be derived for other forms of conditioning such as W q > s. 

The next result gives the joint LT for a finite number of sampled interrenewal times. 
We shall use this general result below to investigate dependence between sampled inter- 
renewal times (see Section 3.2.1); see Appendix A for the proof. 

Theorem 3.2. The joint LT of D 9 .„ = {D q ^ , . . . , D q ^ n ) with 1 < i\ < • • • < i n given 
W q = s, s > i n , can be expressed as 

F Dq , n \ s (v)=G Bs (F D (v 1 ),...,F D (v n )), v = („ 1 ,...,«„)elf, (3.12) 

where B s = {B s ,m}m6N™ with m= (mi,...,m„) and B s>m denotes the probability that 
the number of renewals between the ijth and (ij + \)th sampled renewals is equal to rrij — 1, 
1 < j < n, given W q = s: 

L- OO / 

w — m 
s — n 



B s , m = j^ J2 fwH[ )(l-qT- s (3.13) 



with m = mi + • ■ ■ + m n . 

By Theorem 3.2, the joint distribution of a subset of sampled interrenewal times given 
W q = s depends only on s and not on the indices of the sampled times considered. 
Another relation of interest is between the duration of the sampled finite point process 
and the original interrenewal times; see Appendix A for the proof. 
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Proposition 3.1. The LT of V q given that W q > 2 can be expressed as 

F Vql2+ (v) = G c (F D (v)), v>0, (3.14) 

where C — {C m }rneN and C m is the probability that the number of renewals between the 
first and last sampled renewals is equal to m — 1, given W q > 2: 

2 00 

° m= P(W >2) £ fw(w){w-m){l-qy- m -\ (3.15) 

^ 'u;— m+1 

Since the LT of the duration of the original finite renewal process V = 1 given 
W>2, can be expressed as 

Fv\w>2(v) = E ^ + 

we expect that 

oo 

nF v \w>2(v) n , (3.16) 

n=l 

where .D = {-D„}„ 6 n is the reversion of {fw( w + 1)/P(W > 2)} u , e pj (defined in Section 
2). By plugging (3.16) into (3.14), we can obtain an expression for the duration V q of the 
sampled finite point process in terms of the duration V of the finite renewal process. 

Also, note that, equivalently, the relations (3.3), (3.12) and (3.14) can be expressed in 
terms of the distributions functions (see (3.9), and relations (A. 4) and (A. 7) in Appendix 
A, respectively) or via the characteristic functions. 

3.2.1. Examples 

In the following, we examine characteristics of the sampled finite point process for par- 
ticular distributions of W . 

Geometric distribution 

Suppose that W is a geometric random variable with parameter c, that is, fw(w) = 
c u,_1 (l — c), w > 1, c6 (0, 1). Substituting this fw{w) into (3.4), we obtain, after straight- 
forward calculations, that 

A s , m = (c(l - ?)) m_1 (l - c(l - ?)), m > 1. 

Therefore, A s is the p.m.f. of a geometric distribution with parameter c(l — q) which 
does not depend on s. From (3.3) and using the generating function of the geometric 
distribution, we have 

~ (l-c(l-q))F D (v) 

I'd \s(v) = . 

^ l-c(l-q)F D (v) 
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Now, substituting fw{w) into (3.13), we conclude, after some algebra, that 

B s>m = A s , mi •••A a>mn , m = (mi,...,m„)eM n , 
and hence, from (3.12), 

^D 9 ,„| 8 (V) - [[ ~ ~ • 

i= i 1 - c(l - q)F D (vi) 

The latter expression shows that when W is geometrically distributed, the sampled inter- 
renewal times are independent and identically distributed (i.e., the sampled finite point 
process is a finite renewal process). The LT of V q in (3.14), where C is now the p.m.f. of 
a geometric distribution with parameter 1 — c, simplifies to 

F Vq \2+(v) - 



\-{l-c)F D {v) 



Pareto distribution 

Suppose now that W is a discrete Pareto distribution, that is, fw{w) = w~ a ~ 1 /f(a + 1), 
w>l, a > and £(z) = X)^?=i u ' _z * s the Riemann zeta function. Unlike in the previ- 
ous example, closed forms for LT's of the sampled interrenewal times are not available. 
Nevertheless, we can show that for a particular example, the sampled interrenewal times 
are not conditionally independent. Taking s = 3, i\ = 1 and i 2 = 2, the random variables 
D q i and D q ^ are independent given W q = 3 if and only if 

G Bs (F D («i) , F D (v 2 )) = Ga 3 (Fd (vi ))G As (F D (v 2 )) (3.17) 

for all (2/1,2/2) G R+- Since Fjj(y) is continuous in y, it takes all the values in (0, 1). Then, 
(3.17) implies that Gb 3 (wi,W2) = G a 3 {w\)G a 3 {w2) for all W\,W2 G (0,1) and hence 

-E"3,m = ^3,mi^43,m 2 f° r au m = { m ii m 2) <= . The coefficients of the term Fd{vi)Fd{v2) 
on the right- and left-hand sides of (3.17) are, respectively, 

A 2 = 9(Li a _i(l - q) - 3 Liq(l - q) + 2 Li a+1 (l - q)) 2 
3,1 Li Q _ 2 (l - q) - 3Lia_i(l - q) + 2Li a (l - q) 

and 

6(1 - g + Liq(l - g) - 2Li a+ i(l - g)) 
3 ' (M) Li Q _ 2 (l - 9 ) - 3Lia_i(l - q) + 2Li a (l - g) ' 

where Li„(a) = X^=i cl w /w n , |a| < 1, is the polylogarithm function. Since A| 1 7^ B3 (1,1), 
the random variables D q \ and D 9l 2 are not conditionally independent. This is also why 
we use the term "sampled finite point process" throughout the paper instead of "sampled 
finite renewal process" . 
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Heavy-tailed distributions 

Consider the case of a heavy-tailed distribution for W, in the sense that 

fw(w) ~ caw-**' 1 (3.18) 

as w — > oo, where a S (1,2) and c > 0. Such distributions are common models for long 
flow sizes in network traffic studies (e.g., [6]). As in the example with Pareto distribution 
above, closed forms for LT's of the sampled interrenewal times are not available under 
(3.18). We can nevertheless provide a number of interesting qualitative results concerning 
the case (3.18). 

The relation (3.18) implies that 

P(W > w) ~ cw- a (3.19) 

as w — s- oo. As indicated in [17], page 70, by the results of [2], page 333, this implies that 

P(W q > w) ~q a cw- a , (3.20) 

that is, W q is also heavy-tailed with the same parameter a. (Note that (3.20) does not, in 
general, imply that fw q (w) ~ q a caw~ a ~ 1 .) Also, note that in the case (3.19), the total 
duration V is expected to be heavy-tailed, even for light-tailed interrenewal times £>j. 
Indeed, by Robert and Segers [21], Theorem 3.2, if (3.19) holds and 

P(D 1 >t)=o(t~ a ), (3.21) 



then 

P(V >t)~p(]V> J - c(EDi) a t- a , (3.22) 

that is, the tail of total duration is dominated by that of W. 

On the sampling side, the total duration is also characterized by heavy tails. Indeed, 
by Proposition 3.1 above and [21], Theorem 3.2, 

P(V q >t)~p(c> J-^j ~ co(EDi) a t- a , (3.23) 

where a random variable C is such that P(C = m) = C m , where C m is defined in (3.15), 
as long as (3.21) holds and 

C m ^CQamr a - 1 . (3.24) 
To show (3.24), observe that for large m, using (3.18), 

2 00 

C qca y- w - a - 1 (w-m)(l-q) w - m - 1 

P(W„>2) ^ v A H> 

y 7 w—rn-\-l 
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2 oo / , v — a — 1 

_ Q _i <Tca y-^f k\ fc _ x 



> 2) j^j V "» 



m 



-a-l 



q 2 ca 



HW q >2, h 



Y^Kl-qf- 1 (3.25) 



3C 



q2ca 1 d E(!-^ 



P(W,>2)^ d<? fc= 



-a-l « 2 ca / d 1\ _ a _ 1 cot _ a _ 1 

= m — ; r = m — ; =: m cna. 

P(W,>2)V <W P(W g >2) 

Note also that the argument (3.25) above does not apply to the sampled interrcnewal 
times. For example, for the first sampled interrcnewal time -D q ,i, the analogous argument 
would show that, for coefficients in (3.4), 

V-cfl-fm-"- 1 (3.26) 

as m — > oo, which is not heavy-tailed, in contrast to (3.24). 



4. From sampled finite point process to finite renewal 
process 

In this section, we study inference of the original distributions of the number of renewals 
and interrenewal times. 

4.1. Distribution of number of renewals 

We are interested here in estimating the p.m.f. fw{w) from i.i.d. observations of W q . We 
revisit a nonparametric estimator of fw{w) introduced in [17] and clarify several issues 
surrounding its use and properties. A number of open questions will also be raised. 

Estimation of fw{w) is based on a theoretical inversion of the relation (3.1) which will 
be discussed first. The relation (3.2) can be written as Gw{z) = Gw q (q~ 1 z + (1 — q^ 1 )), 
which has the same form as the original relation (3.2) when W and W q are exchanged, 
and q is replaced by q^ 1 . In view of (3.1), we would then expect that 

OO / 

S 



f w ( w ) = J2 (J (<rTa - q~ 1 r~ w fw q (s) 

s—w ^ / 
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Holm and Veitch [17] claim that (4.1) holds when q G (0.5,1) (with this choice, note 
that g _s (l — q) s G (0, 1) in (4.1)). As the following elementary result shows, this is not 
a necessary condition. The relation (4.1) also holds for q G (0,0.5] as long as fw q {w) or 
fw{w) decays to zero fast enough; see Appendix A for the proof. 

Proposition 4.1. If 

E Q (1 Z f I1 fwM = f] (")2"-"(l - q ) w - n fw(w) < oo, n>l, (4.2) 

s—n ^ ' w—n ^ ' 

then the relation (4-1) holds. 

For example, for geometric W satisfying fw{w) = c"' _1 (l — c), w > 1, c£ (0,1), the 
condition (4.2) holds when 2 — (2 — e)q < c" 1 , where e > is arbitrarily small and where 
we have used the fact that (™) in (4.2) can be bounded by w n up to a multiplicative 
constant. Note also that (4.2) always holds for q £ (0.5, 1). When q s (0, 0.5], on the other 
hand, a number of distributions fw{ w ) OI interest, such as heavy-tailed distributions 
(Section 3.2.1), do not satisfy (4.2). In these cases, and generally for q£ (0,0.5], the 
relation (4.1) needs to be modified by a procedure used and referred to as an analytic 
continuation in [17], page 71. The procedure is defined below. We should note that this 
procedure is viewed below as a convenient algebraic trick that makes series converge (see, 
e.g., the proof of Proposition 4.3) rather than as a suitable analytic continuation, which 
is a complementary viewpoint followed in [17]. 

Let zq = 1 — q and pick an arbitrary sequence Zk, k = 1, . . . , I, such that 1 > zq > z\ > 
■ ■ ■ > Zi-i > z\ — and Zk G Ck-i — {z : \z — Zu-\\ < 1 — Zfe-i}, k = 1, . . . ,1. For a sequence 
x = {i„}nEN, define formally and recursively sequences T^ k '(x) = {T^* 1 (a;)„}„ e N, k = 
1 , as 

T^ k \x) n = Y,[ n )T {k - 1) (xUz k ~z k _ 1 y- n , n>l, (4.3) 

i—n ^ ^ 

where T^\x)i = x-Jq 1 . It is also convenient to define the mapping underlying (4.1), that 
is, S(x) = {S(x) n } ne tn, where 

00 /A (— 1 V _n 

i—n ^ ' 

The following elementary result relates S(x) and T^'ix) when x satisfies a natural con- 
dition. 



Proposition 4.2. If a sequence x = {x n } ne p$ satisfies 

'A (1-9) 
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then 

T®(x)=S(x), (4.6) 
where T^ l '(x) and S(x) are defined in (4-3) and (4-4); respectively. 

The next result formalizes the fact that fw can be obtained as T"'(fw q )- Note that 
this is true without any assumptions on q and fw- 

Proposition 4.3. For any q <G (0, 1) and p.m.f. fw, we have 

fw=T {l \f Wq ), (4.7) 

where T"> is defined in (4-3). 

With theoretical inversion formulas (4.1) and (4.7), we can now turn to estimation. 
Let W q .k, k = l,...,N, be i.i.d. copies of the variable W q and 



1 N 

M s ) = ^J2 1 {w q , k = s }, s>0, (4.8) 

k=l 

be the empirical p.m.f. of fw q , where 1,4 denotes an indicator function of an event A. 
Note that we assume, in particular, that the event W q ^ = can be observed in practice 
(or, equivalcntly, N and W q .k > 1 can be observed in practice). In the network traffic 
context, we naturally observe only W qi k > 1- The total number of flows N is deduced 
from the additional information in the sampled packet headers (e.g., [10]). (To be more 
precise, N is actually estimated, but we suppose it to be known for the sake of simplicity.) 

In view of (4.7), it is natural to introduce the following nonparametric estimator of 
fw- 

fw(w)=T^(f Wq ) w (4.9) 
= S(fw q ) w = Y,[ w ) „s (l-g)'""Av.(«). ( 4 - 10 ) 



(I- 



where T®(x) and S(x) are defined in (4.3) and (4.4), respectively. The first equality in 
(4.10) follows from Proposition 4.2 since only a finite number of the /w,(s)'s are nonzero. 

We will next show that the estimator (4.9) is asymptotically normal under suitable 
assumptions. The suitable assumptions arc quite strong, but we do not expect that 
they can be weakened much, as explained in Proposition 4.4 below and a discussion 
surrounding it. For w > 1, we also let 

R ^ = EQ ^ — ^ (s) (4 - n) 
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k 



l) s , (4.12) 



where the second equality follows from (3.1). 
Theorem 4.1. Suppose that 

R q>w <oo, w>l, (4-13) 
where R q , w is defined in (4-11)- Then, as N —>oo, 

{VN(fw{w) - fw(w))} weN {S(Q w } w&i , (4.14) 

where the convergence is in the sense of finite- dimensional distributions, S(x) is defined 
in (4-4) an d £, = {Cs}sgn is a zero-mean Gaussian process with the covariance structure 

E£ Sl £s 2 = fw q (si)S SuS2 - fw q {s\)fw q {s 2 ) (4.15) 

(and, as usual 7 S Sl-S2 — 1 if S\ — S2, and — if S\ ^ S2, this being the Kronecker symbol). 
In particular, the limiting variables S(£) w are zero-mean, Gaussian and have the variance 

ES(Oi=R q , w -fw(w) 2 . (4.16) 

Proof. We only consider the convergence (4.14) at fixed w > 1. Note that, using Proposi- 
tion 4.2, VN(f w {w)- f w (w)) = S(VN(fw q - fw q ))w For fixed j > w and x = {x n } n€ n, 
define 

i—w ^ ' 

Using [1], Theorem 3.2, page 28, it is enough to show that: 

(i) S(VN(f Wq - f Wq ))j, w 4 S(0 jtW as N -> oo; 

(ii) S(C)j, w -4 S(£) w as j -> oo; 

(iii) for any <5 > 0, 



limsuplimsupP(|5(VJV(/ w , - fw t )) j>w - S(VN(f Wq - f Wq ))J > 5) = 0. 



The convergence in (i) is elementary since {vN(fw q (s) — fw q {s))}sefi converges to £ = 
{CsjseN m the sense of finite-dimensional distributions and since, for fixed j, S(x)j tW 
involves only a finite number of elements of x = {.t„}„ 6 n. 

The convergence in (ii) can be proven in a stronger sense, that of almost sure conver- 
gence. For this, observe that in the sense of finite-dimensional distributions, 

{&} seN = {B(F Wq (*)) - B(F Wq (s - l))}, gN + {f Wq (s)B(l)} seN 

(4.17) 

— + i£ (2) l 
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where B = {-B(£)}te[o.i] is a standard Brownian motion. It is then enough to show almost 
sure convergence of the corresponding terms for £W and Doing this for £^ 2 ) is 
elementary and thus we do so only for Note that 

{6 1) } s ^*{fwM /2 Vs}seN, (4.18) 

where r] s arc i.i.d. -/V(0,1) random variables. By the three series theorem, S{^')j w — > 
S^^^w a.s. as long as the condition (4.13) holds. 

For the convergence in (iii), note that the probability in (iii) can be expressed and 
bounded as 



PI 



s=j+l 



>s 



oo / \ 2 
S 



E Q' {1 ~f! s ~ w) E(VN(f Wq (s)-f Wq (s))) 2 



s=j+l 



j+l<si<s 2 



«i\ /s 2 \ (l-g) s i+^- 2 - 



2 ri \2{s—w) 



^ /sVll-r""' 1 ,, , x , , x2x 



S=J+1 

J + 1<S1<«2 V 7 V 7 ^ 

where Qj = Y^£Lj+i (^,) 2 ^— /w„ («)• It remains to observe that Qj — s- as j — >• oo, 

by the assumption (4.13). The proof of (4.16) is now elementary. □ 

Note that the condition (4.13) is quite strong and, for example, does not allow for 
heavy-tailed distributions when q£ (0,0.5]. When this condition does not hold, since 
VN(f w (w) - fw(w)) = TW(VN(fw 9 - fw q ))w, we may think that the limit of the left- 
hand side of (4.14) should be 

T«\Z) = {T«\O w } weW (4.19) 

where the process £ is as in Theorem 4.1. This, however, is not expected to hold. In 
fact, without the condition (4.13), the expected limit (4.19) is not even well defined, as 
the result below shows. Consider the case of I = 2, z-2 = and T^ 2 )(£) for simplicity (the 
argument can also be extended to general /). With the representation (4.17), the term 
7 1 (2)(^( 2 )) i s we U defined by arguing as in the proof of Proposition 4.3. We will show that 
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without the condition (4.13), T( 2 )(£W) is not well defined, in the following natural sense. 
Note that 

s—n ^ ' 

is well defined, where we take £s = fw q {s) 1 ^ 2 rj s with r] s as in (4.18) and let 

k—w ^ ' 

The proof of the following result can be found in Appendix A. 

Proposition 4.4. The condition (4-13) is necessary for T^ 2 \£y^)j^ w to have a limit in 
distribution as j — > oo . 

We shall conclude this section by considering the convergence (4.14) in a suitable space 
of sequences. We let loo, a = { x = {^n}neN : Halloo, a := sup n > a n |;En| < oo}. See Appendix 
A for the proof. 

Proposition 4.5. //, for b' > 0, 

EM( S ) I/2 (^) S <-, (4-20) 

then the convergence (4-14) a ^ so holds in the space lao,b for any < b < b' . 
Note that the condition (4.20) is stronger than (4.13). 

Theorem 4.1 and Proposition 4.5 suggest several possibilities for confidence intervals 
(CI's) of fw- For example, following Theorem 4.1, we can define the 100ct% CI of fw(w) 
for fixed w as 

(fwH - ZaN-^iESiOD^Jwiw) + ZaN^ES®*) 1 ' 2 ), (4.21) 

where ES^)^ is defined as in (4.16) by replacing fw in R qiW and fw(w) by fw q i an d 
z a is the lOOcrth percentile of |7V(0, 1)|. Another possibility is to use Proposition 4.5 and 
set the 100a% confidence "interval" (set) across all w simultaneously as 

{f--\\fw-f\\oo,b<N-V 2 q(a)}. (4.22) 

Following [13] and [3], the quantity q(a) should be taken as the lOOath percentile of 



(h,...,i N )£{l,...,N} N \ 



N 



S[N' 1 J2 1 {w q ., k =-})-S(fw q ) 



k=l 



>,6/ 

(4.23) 
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which can be viewed as a bootstrapped version of the probability 

R N (z) = P(VN\\S(f Wq ) - S(f Wq )\\oo,b <z) = P(VN\\fw ~ fw\\oo,b < z). (4.24) 

The choice of q{a) is justified if we can show that i?jv and Rn arc asymptotically equiva- 
lent in distribution in a suitable space of functions. This could be shown by following the 
approach found in the proof of [13], Proposition 3.15. The proof will not be given here. 
We should also note that the discussion above assumes that the conditions of Theorem 
4.1 and Proposition 4.5 are met, that is, fw{w) or fw q {w) has a sufficiently fast decay. 
In practice, these conditions cannot be verified and caution should be exercised with the 
resulting CPs. Another practical problem with Proposition 4.5 and a related bootstrap 
procedure is that b is not known in advance. In the simulation study found in Section 
5.1 below, we shall explore CPs given by (4.21) and also use a bootstrap procedure, but 
where |j ■ Ho^f, is replaced by || • j|; with \\x\\i =sup 1<n< ; \x n \ for some fixed I. 

4.2. Distribution of interrenewal times 

Here, we are interested in estimating the original distribution function Fd of interrenewal 
times. We focus on nonparametric estimators, in the spirit of Buchmann and Griibel [5] 
and B0gstcd and Pitts [3], and their basic properties (such as asymptotic normality). 

Estimation of Fd is based on a theoretical inversion of the relation (3.3). (As mentioned 
in Section 3, forms of conditioning other than that on W q — s are possible and could be 
dealt with by following the approach developed below.) Note that, from (3.3), we would 
expect 

F D (v)=G as (F Dg . ls (v)), (4.25) 

where a s = {a s . n }„ 6 N is the reversion of the sequence A s = {^4 s ,n}neN (see Section 2). In 
terms of distribution functions, the relation (4.25) can be written as 

oo 

FD = Y,a., n F% Aa . (4.26) 

n=l 

The following result provides sufficient conditions for (4.26) to hold and follows directly 
from [3], Theorem 1. It uses the following notation. Let r(G A ,) and r{G as ) be the radii 
of convergence of the corresponding power series G As and G as (in the sense found in, 
e.g., [16]). By B0gsted and Pitts [3], Proposition 1, there exist 

a(G As ) e (0,r(G As )}, a(G a .) G (0,r(G o J], (4.27) 

such that \w\ < a(G aa ) implies that there exists a unique z where \z\ < u{Ga,) and 

G As (z)=w, z = G aa (w). (4.28) 

We also need a suitable space of functions. For r > 0, let _D T [0,oo) = {/: [0,oo) M- 
K:||/||oo,r:=sup t>0 e- rt |/(t)|<oo}. 
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Theorem 4.2 ([3], Theorem 1). Let t > be such that 

F Dq . ls (r)<a(G as ). (4.29) 

The series on the right-hand side of (4-26) then converges in D T [0,oo). If, in addition, 

F D {T)<o-{G Aa ), (4.30) 

then the relation (4-26) holds. 

As discussed in [3], the conditions (4.29) and (4.30) always hold for large enough r. 
We now turn to estimation based on the inversion (4.26). It is natural to consider the 
estimator 

oo 

F D = Y,<nF* D l iW (4.31) 

n=l 

Here, a s = {a Si „}„ e N is the reversion of the sequence A s = {A Sj „}„ 6 n defined as 

— fwM( ")(l-qT- s , (4.32) 

fw t (s) w=s+n _ x V s L J 

where fw q and fw are given in (4.8) and (4.10), respectively (i.e., A s is defined as in (3.4), 
but where fw and fw are replaced by their sample counterparts). For the distribution 
function estimator, we take 

1 N 

F D q ,i\s(t) = ? . ^2 1 {D q , z , k <t,W q , k =s}, (4-33) 

where D q ^^ are i.i.d. observations of D q ^. An important difference between (4.31) and 
similar estimators considered in [3, 5] is that (4.31) involves an estimator a s of a s (whereas 
a s was assumed to be known in these related works). This obviously makes the analysis of 
the estimator (4.31) more involved. Also, note that the estimator Frj in (4.31) is defined 
for fixed s and i. In addition, the estimator involves averages which are conditional on 
W q = s. In particular, the expressions (4.32) and (4.33) are only defined for fw q (s) ^ 0. 

We will show below that the estimator (4.31) is asymptotically normal under suitable 
assumptions. Note that the estimator (4.31) can be viewed as a functional of A s (via a s ) 
and Fjj ,| s . We first need a suitable result on the asymptotic normality of the latter quan- 
tities. Let (C = {CnjnGN, Z = {Z(t)} t >o) be a zero-mean Gaussian process characterized 
by the following: 

C„ = -|% + 7 4t £ S(O w ( W -f\(l- q r-s, n>l, (4.34) 

fwM fwM^s-! \ S - l J 

m = _ vFD 9 As{t) + _^_ B{f {s)F {t)) t > Qj (4 35) 

fw q (s) fw q (s) 
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where S(£) appears in (4.14) of Theorem 4.1, rj is a zero-mean Gaussian variable with 
Erj 2 = fw q (s) — fw q ( s ) 2 an d B is a standard Brownian bridge such that 

E V S(0 W = 1 {S > W} f*) (-l) s - w q- s (l - q) s - w fw q (s) - fw q (s)fw(w), 
E V B(f Wq (s)F Dq . ]s (t)) = f Wq {s){\ - fw q {s))F DqAs (t), 

/ s\ (-iy- w 

ES(O w B(f Wq (s)F Dq ils (t)) = 1 {S > W} M 1 ± (1 - q) s - w fw q (s)(l - f Wq (s))F Dqils (t). 

The proof of the following result can be found in Appendix A. 
Proposition 4.6. Suppose that for some zq > 1, 

oo 

^y/R^O—q^z^w'Koo, (4.36) 

W — 1 

where Rq jW is defined in (4- 11). Then, 

VN((A s ,F Dq . {s ) - (A s ,F Dq . ]s )) A (C,Z) (4.37) 

with the convergence in the space (1 OCiZo ,Dq[0,oo)), where the limit (C,Z) is characterized 
by (4-34) and (4-35) above. 

We are now ready to prove the asymptotic normality result for the estimator Fp in 
(4.31). In addition to the notation introduced before Theorem 4.2, we shall also use the 
following. For a formal power series G(z) = X^^Lo a ™ z ™i ^ 

u G (a)= inf \G(z)\. (4.38) 

\z\=cr 

If r(G) denotes the radius of convergence of G as before, we also let 

r (G) =sup{cr :cr <r(G),u G (a) >0 for 0<er<a- o }- (4.39) 

The notation (4.38)-(4.39) follows that found in [16]; also recall the notation Q a {z) from 
Section 2. 

Theorem 4.3. Suppose that the condition (4-36) holds with zq > 1. Also, suppose that 
for some r > 0, (4-29) and (4-30) hold, and 

Ga s (F DqM s(r)) < z , (4.40) 
9A 3 {GaAFn qi \s{r))) < max ^g 4 »- (4.41) 

0<a<r o (G As )^z o 
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Then, 



Vn(f d -f d )^x 



(4.42) 



with the convergence in the space D T [0,oo). The limit X is a zero-mean Gaussian process 
which can be expressed as 



X = - * (C ° ^)) n F* D l i]s + a ^nF* D ^ * Z, 



(4.43) 



where (C,,Z) is the limit process appearing in (4-37). 



Remark 4-1- The conditions (4.40) and (4.41) hold for large enough r. The presence 
of zo in (4.36) and (4.40) is of technical interest: if fw{w) has faster decay, then zq could 
possibly be taken larger in (4.36) and hence r could be taken smaller in (4.40). 

Proof of Theorem 4.3. For notational simplicity, we will not write the index s in 
A s , a s , A s or a s . By the Skorokhod representation theorem and using Proposition 4.6, we 
can suppose that 

VN((A,F Dqzls )-(A,F Dqtls ))^((,Z) a.s. (4.44) 
in the norm (|| • ||oo,2 > II ' lU.o)- Write 



N(F D -F D ) = \^^a^. |8 - «n*Sk| a 

n—1 n—1 

oo oo 

= ^^(a B - a„)(F^ i|s - F* D n qAs ) + \^V^(a n - a„)i^\ |s 

n—1 n—1 

oo 

+ VNY, an{F* D n qA s ~ F*dUs) =■ T 1 +T 2 + T 3 . 

n=l 

By B0gstcd and Pitts [3], Theorem 2, we have 



* Z a.s. 



in the norm || • ||oo.r, which is the second term in the limit (4.43). 

We next show that the term T 2 converges to the first term in the limit (4.43). Observe 
that by using [5], Lemma 6(b) (see also the inequality used in the proof of their Lemma 
7), we have 



\F 



2||oo,t " 



TV V(a„-a„)F*™ |s - V(-a« * (C°a))„i^\ 
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oo 

< Y,\^(^-a n ) + (a^U(Coa))J\\F* D l ils \\ 00!T 

71 = 1 

OO 

< C]T| VF(a„ - an) + (a™ * (C o a)) n \F DqAs (r) n . 

n=l 

Writing VN(a-a) +a<< 1) * (( o a) = (y/N(a o A - I) + (a^ o A) * () o a and using the 
inequality G XO y(z) < Sx^j/W); wc further obtain that 



ll^lU.r < C^\(y/N(ao A -I) + (a« o A) * C)„l(^a(^, i | s (r)))" 

n=l 

<C(i?i+i? 2 + i?3 + i?4), 

where, with z x = G a {F Dq .\ s {T)), 

OO 

i?i = ^ I (y/N (ao (A + A- A) -aoA) - (a (1 > o A) * (Vn(A - A))) n \z?, 

n=l 

oo 

R2 = ]T|((a (1) ° * (VN(A - A)) - (a« o A) * (ViV(A - A))) n \z?, 

00 

n=l 

00 

^4 = EK( fl(1) A ) * (ViV(l- A)) - (a^ o A) * C)„K- 

n=l 

We will next show that i?^ — > a.s., fc = 1, 2, 3, 4. 

For the term by using Proposition B.l in Appendix B, we first observe that 

Ri < ^ E((« (2) ° (A + + (A- A) + )) * (A A) + * (A A) + ) n z- 



n=i 

1 



2VN 



Gam (Ga( Z i) + ^-a( Z i))(^v / W(A-1)( 2; i))" 



By (4.44), and since z\ < z by the assumption (4.40), we have Q /zyY-l-A)^ 1 ) ~~ Gq{z\) 
a.s. Next, we want to show that 

Gum (G S (zi) + G A _z{zx)) < C a.s. (4.45) 

for some random constant C . For this, we further examine the radius of convergence of 
C/ a <2). By the Cauchy-Hadamard formula (see, e.g., [16], Theorem 2.2a, page 77), 

r(G z m) = r{Gt) = r(G^). 
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By applying the inequality after the proof of [16], Theorem 2.2b, page 99, we then have 
r{Ga(v)> max u Gs (a)> max i/g.-M, 

0<a<r„(G A ) 0<a<r o (G A )/\z o 

where the notation ro(G), vq was introduced above. Since G-Az) converges to G a{z) 
a.s. and uniformly on \z\ < Zq, we have 

max vg~{(j)^ max vg a {o) a.s. 

0<a<r o (G A )/\z o A 0<<J<r o (G A )/\z 

The relation (4.45) now follows from the relations above, the fact that G A (z\) + 
G A _ A {zi) — > Ga{zi) a.s. and the assumption (4.41). We can now conclude that R% 
a.s. 

For the term i?2, note that for fixed K>1, 



r 2 < x> (1) -« (1) )ni(^i)r^-A)(^) 



n=l 
/ K 



< X)K S(1) -° (1) )nl(^l)) n + E I^K^l))" 
\n=l n=K+l 

oo \ 

+ E k 1) l(^i)) n p^_^(«i)=:i2 2 ,i + ^«,2 + « 



2,3- 



For a fixed K, i?2,i — > a.s. by using the a.s. convergence of A in (4.44). On the other 
hand, arguing as for the term R\ above, we can make sure that i?2,2 and i?2,3 are 
arbitrarily small for large enough K. For the term i?3, by using Proposition B.l, 



< Y, ( a + ( A + + (A-A)+)*(A- A) + ) n z?Q^ {A _ A) ( Zl ) 

n=l 



-^=g a (2 ) {gA{zi) + g A _ A {z 1 )){g^ {A _ A) { Zl )f 



fN 

and we have R3 — > by arguing as for the term i?i . By using the bound 

Ri < G a m (QA(zi))g^ A _ A) _ c {zi), 

we get that R4 — > a.s. 
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Finally, we shall prove that the term T\ is asymptotically negligible. Note that, as in 
the proof of [5], Proposition 8, 

oo 

71=1 

OO 

< \\VN(F Dg . ls - F , i4 | 8 )|| 00i o E I 3 " - a n \H n (r), 

n=l 

where 

Using the fact that Fn ,|g(r) — ^ i^j i | s (r) a.s., we can further deduce that 

oo 

\\Ti\Ur < C\\VN(F DgA]s - Fo^W^ J2 K - a n |((l + <5)F D9 „| s (r))" 

n=l 

for any arbitrarily small S > and all iV large enough. The convergence of the latter 
series to can be proven by arguing as for the term T2 and using the fact that 5 > can 
be chosen arbitrarily small. □ 

Despite its obvious theoretical interest, Theorem 4.3 is of limited relevance in practice 
where the conditions of the theorem cannot be verified and r is unknown. Note also that 
the form of the limiting Gaussian process in (4.43) is not easily tractable. In a simulation 
study found in Section 5.2 below, we make qualitative observations on the performance 
of Fo and the use of bootstrap for CPs. The CI's provided below will be based on the 
sup-norm on an interval [0,T], with fixed T. Justification of the bootstrap procedure is 
beyond the scope of this paper. 

5. Simulation study 

Here, we provide a simulation study for the inference of the distributions of the number 
of renewals and interrenewal times. 

5.1. Simulations for number of renewals 

As mentioned in Introduction, the estimator fw in (4.9) is known to perform poorly as 
q gets smaller. Here, we shall re-examine this fact and the performance of the estimator 
fw via the asymptotic variance ES^)^ in (4.16) of the normal limit in (4.14). In this 
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regard, two regimes should be distinguished: 



(5.1) 



• stable ( sup R q , w < oo or sup ES(£) W < oo 

^w£N ' wEN 

• explosive ( sup R q . w = oo or sup ES(£) W = oo 

^toGN wEN 

where Rq }W is defined in (4.11), that is, 

~ / a \ a (l- g )g(-« 
^ = <L L ) ~2s M s )- 

s—w ^ ' 

As will be seen below, the performance of the estimator fw largely depends on which of 
the two regimes is considered. Before providing some simulations, we take a closer look 
at these regimes for the distributions fw considered in Section 3.2.1. 

Geometric distribution 

Suppose that fw follows a geometric distribution with parameter c £ (0, 1), as in Section 
3.2.1. We shall derive lower and upper bounds for R q , w . For the lower bound, observe 
that 

oo 

fwM > E « S ( 1 - lT~'fw(w) = Cl(cqY 
w—s 

for some constant C\ (which depends on c and q) and hence that 

R q , w > d Y, K —^k (ic) s =C2{-) (5.2) 

s—w \ " / 

for some constant C% (which depends on c and q). For the upper bound of R q>w , we need 
a bound on binomial coefficients. We shall use the standard (but rough) bound given by 
(fe) < n k /k\. We shall also use the following auxiliary result, proved in Appendix A. 

Lemma 5.1. Let a € (0, 1). Then, for some constant C > and all s > 1, 

w s a w dw < Cs s a s (l + (lna" 1 )- 8 ), (5.3) 

J w 2s a w dw < Cs 2s a s (l+ Qlna -1 
Using the lemma, with generic constants Ck, 

„ oo 

fw q (s) < -f<f(l q)- S £ wS W 1 - l)) W 



2s\ 

(5.4) 
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<^q s {l-q)- s w*(c(l-q)) w dw 

s - J s 

< ^q s (l q)- s S s (c(l q)) S (l + (ln(c(l - 

^C^ecqYil + Qnicil-q))' 1 )' 8 ), 

where the last inequality follows from Stirling's formula. Hence, by arguing similarly and 
using (5.4), we have 



(w!) 2 V 9 V V 2 (l-g) 2 ec 

(1 - q) 2w ( ecq 



q 2w \ln(c(l - q))-\ 

V ^2 (l-<?) 2 ec 
,I^VYl+6n-^-V lU + riln. 9 



(5.5) 



c(l-q)J \2 (l-q) 2 ec 

l^qhMi-q))- 1 ^ 2 ™ 



2 (l-g) 2 ec 

+ |1 „ ' yn^Hdi -,))-' v" 



c(l-q)J \2 (l-q) 2 ec 
if — <1 



q j q 

where, for the last inequality, we used the fact that all three log terms (before raising to 
the powers —w and —2w) are bigger than 1 under c 3 c/q < 1. 

The bounds (5.2) and (5.5) show that, for the geometric distribution f\y, 

C _o . . 

- < e => stable regime, 
q 

(5-6) 

— > 1 => explosive regime. 

What happens in the range c/q G (c~ 3 , 1] remains an open question. Our experience in 
practice suggests that the critical point for c/q is closer to 1 and that e -3 is a very rough 
bound. 
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Heavy-tailed distribution 

Suppose that fw follows a heavy-tailed distribution with parameter a (Section 3.2.1). 
We will show that in this case, the distribution is always in the explosive regime (and 
this happens for all a > 0, not just a G (1, 2)). Indeed, observe that 

oo 

fw, (s)>C^2 I s (1 - qy-'w-"- 1 > Cq*s- a - 1 

w—s 

and hence 

R q , w > C J2 I'*-- 1 > C" (5.7) 

s—w 

It remains to observe that the lower bound in (5.7) diverges as w — > oo since q G (0, 1). 

Remark 5.1. In the examples above, note that in the explosive regime, Rq iW or ES(^)^, 
diverges at an exponential rate. The term "explosive" was chosen to reflect this fact. Also, 
note that in the explosive regime, in order to have convergence of fw{w) for all w, we 
naturally need to consider weighted spaces, as in Proposition 4.5 above. 

Remark 5.2 (The impact of small q). Note that R q ^ w too as q^O. Since R q , w > 
q~ 2w fw q (w), when w is larger and q is small, R q<w (or Vai(fw(w))) can also be too large 
for practical purposes (e.g., with w = 10 and q = 0.1, R q>w > 10 20 fw q (w)). Moreover, 
as with the geometric distribution above, we may expect that most of distributions of 
interest (with unbounded support) belong to the explosive regime for small enough q. 
These observations reiterate the current understanding that inference of fw q becomes 
impractical for small q. 

In Figures 2 and 3, we illustrate the performance of the estimator fw(w) in the two 
regimes above. Figure 2 is for the geometric distribution with c = 0.25, q = 0.6, N = 500, 
and Figure 3 is for the Pareto distribution with a = 1.5, q = 0.7, N = 1000. The left- 
hand plots in these figures depict the true p.m.f. fw{w), and the 5%, 50% and 95% 
percentiles of the distribution of the estimator fw(w), based on 1000 Monte Carlo (MC) 
replications. The right-hand plots in these figures depict the true asymptotic standard 
deviation {ES{£) 2 W / N) 1 / 2 or sd(f w (w)), and the 5%, 50% and 95% percentiles of the 
distribution of the estimator (ES^^/N) 1 / 2 or sd(f w (w)), again based on 1000 MC 
replications. 

Note that by the definition (4.10) of fw(w), we have fw(w) = for all w larger 
than the size of the largest sampled flow(s). The largest sampled flow(s) in the study 
corresponding to Figure 2 consists of 5 sampled packets. For this reason, the fw(wYs 
are all zero for w > 6 in Figure 2 (left plot) and obviously have zero estimated standard 
deviation for w > 6 in Figure 2 (right plot). Figure 3, on the other hand, does not depict 
zero fw{u>ys because the size of the largest sampled flow(s) is much greater. Indeed, only 
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Figure 2. Performance of the estimator fw(w) for geometric distribution with c = 0.25, q = 0.6, 
N = 500. Left plot: true fw(w) and MC-based 5%, 50%, 95% percentiles of fw(w). Right plot: 
true asymptotic standard deviation sd(fw{w)) and MC-based 5%, 50%, 95% percentiles of 
sd(f w (w)). 



a short range of w = 1, . . . , 11 is considered in Figure 3. As noted in Remark 5.1 above, 
the estimator standard deviation grows exponentially with increasing w. For larger w, 



Estimator 



Estimator St. Deviation 



- True 
MC 5% 
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Figure 3. Performance of the estimator fw{w) for Pareto distribution with a — 1.5, q = 0.7, 
N= 1000. Left plot: true fw{w) and MC-based 5%, 50%, 95% percentiles of fw{w). Right 
plot: true asymptotic standard deviation sd(fw(w)) and MC-based 5%, 50%, 95% percentiles 
of sd(f w (w)). 
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the features of both plots in Figure 3 would be dominated by this exponential blow-up, 
even if there is a drop to zero in fw(w) and its estimated standard deviation after the 
largest sampled flow(s). 

Figure 3 depicts typical estimation in the explosive regime: fw{w) and its estimated 
standard deviation blow up (before dropping to zero). In Figure 2, on the other hand, 
fw(w) and its estimated standard deviation are stable and reach zero. This is the sit- 
uation associated with the stable regime. Thus, even though the parameters c and q in 
Figure 2 do not satisfy the sufficient condition for the stable regime in (5.6), we still ex- 
pect that these parameters actually lead to the stable regime. (Recall that, as indicated 
earlier, the condition in (5.6) is likely to be very strong.) Also, observe that Figure 3 is for 
a sample size of only ./V = 1000, and roughly the first 5 frequencies can be estimated with 
confidence. With larger sample sizes, for example, with N = 10000 and N = 100 000, 
roughly the first 10 and 12 frequencies, respectively, can be estimated with confidence. 

In Figure 4, we give an idea of the appropriateness of confidence intervals (CI's). The 
left- and right-hand plots of Figure 4 correspond to the situations considered in Figures 2 
and 3, respectively. The 95% MC-bascd percentile of fw(w) is depicted as in Figures 2 and 
3. This percentile is, in fact, centered at the true fw{w). It can be thought of as an ideal 
upper 90% CI bound, corresponding to dealing with an exact distribution of fw(w). We 
also depict analogous bounds based on bootstrapping (BT) and the asymptotic normality 
(AN) result (4.14) for fw(w), with either true ("true") or estimated ("est.") asymptotic 
variance. In the cases of BT and AN (est.) bounds, what are depicted are, in fact, the 
median bounds obtained from MC realizations. As with MC, the BT, AN (true) and AN 
(est.) bounds are centered at fw(w). Note from Figure 4 that CI's based on both BT and 
AN (est.) approaches, which are the only practical alternatives, are quite satisfactory. 

5.2. Simulations for interrenewal times 

We present here a simulation study for the estimator Fjj in (4.31) of the distribution 
of interrenewal times. Two cases are considered. In case 1, we take geometric fw with 
c = 0.25 and q = 0.6, as in Section 5.1 (corresponding to the stable regime). In case 2, 
geometric fw is taken with c = 0.7 and q = 0.6 (corresponding to the explosive regime; 
the simulation results are similar if the Pareto distribution is used as in Section 5.1). In 
both cases, D is supposed to be an exponential random variable with parameter 1. The 
respective sample sizes are N = 500 and N = 1000. 

Figure 5 presents simulation results for case 1 in the left-hand plot and case 2 in the 
right-hand plot. We depict the true function Fd and the 5%, 50% and 95% percentiles 
of the distribution of Fr>, based on 1000 MC realizations. The plots also contain the 
medians of 90% bootstrap-based CPs, computed from 1000 MC replications. The latter 
suggests, in particular, that such CPs are quite appropriate. We should also note that the 
estimator Fjj is here based only on the first sampled interrenewal times when s = 2, that 
is, i = 1 and s — 2 in (4.31). In addition, the estimator is computed by truncating the 
infinite sum in its definition (4.31) and by evaluating convolutions numerically through 
discretizing convolution integrals. 
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Figure 4. 95% CI upper bounds for fw(w) — fw{u>), that is, for fw(w) centered at true fw(w), 
for the two situations of Figures 2 and 3. The left plot corresponds to Figure 2 and the right plot 
corresponds to Figure 3. Both plots consist of the 95% MC-based percentile of fw(w) — fw{w) 
(MC), median bound using bootstrap (BT) and bounds based on the asymptotic normality 
(AN) result (4.14) for fw(w) with true ("true") and estimated ("est.") asymptotic variance. 
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Figure 5. Performance of Fd and the bootstrap CI's for the exponential distribution with 
parameter 1. Left plot: N = 500, q = 0.6 and geometric fw with c = 0.25. Right plot: N = 500, 
g = 0.6 and geometric fw with c = 0.7. Plots include true Fd, MC-based 5%, 50% and 95% 
percentiles of Fd , and the medians of 90% bootstrap (BT) CPs. 
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Note, from Figure 5, that estimation is satisfactory, even for case 2 which corresponds 
to the explosive regime. This seems quite surprising and needs further explanation. In 
fact, in the setting of case 2, only the first several values of fw(w) are really relevant 
for the estimator This can be seen from the definition (4.32) of the sequence A s , 
which enters into Fd after reversion. Note, from (4.32), that in calculating A Sjn , the 
first series term fw( n + 1) is weighted by (1 — q) 71 ^ 1 - This additional factor thus helps 
to keep the blow-up of fw under control. Moreover, from the examples above, if the 
growth of fw{w) is thought to be of order q~ w , then the factor (1 — q) w annihilates this 
growth when q > 0.5. In practice, and in case 2, we also observe that, even though fw(w) 
is highly varying for larger w, the sequence A s _ n decays to zero rapidly. Since the first 
few values of fw{w) can be estimated with confidence even in the explosive regime, the 
above explains the satisfactory performance of Fjj in case 2 of Figure 5. On the other 
hand, note also that the i-range is smaller in the right-hand plot of Figure 5 and that the 
variance of the estimator starts to diverge at the boundary T — 5 of the range. In fact, 
this divergence would be more pronounced (and would dominate the plot) if we increased 
the range of t. Thus, the performance of the estimator is satisfactory, but only to some 
time point T. 

In Figure 6, we also illustrate the performance of Fjj when using several other forms 
of conditioning on W q . The two plots in the figure correspond to the respective plots of 
Figure 5. They depict the 5%, 50% and 95% percentiles of Fd when conditioning on W q = 
s with s = 2, s>2 and s = 3. (In the case of conditioning on s > 2, the formulas (3.10) 
and (3.11) are used.) The 5-95% interpercentile range is the smallest when conditioning 
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Figure 6. Performance of the estimator Fn(t) for two cases from Figure 5. Plots include 
MC-based 5%, 50% and 95% percentiles of Fn(t) when conditioning on W q = s with s — 2, 
s>2 and s = 3. 
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on s > 2, although comparable to that when s = 2 is used. An obvious note to add here 
is that different types of conditioning (for the same N) lead to different effective sample 
sizes of the data used in the computation of Fd , with that for s > 2 being the largest and 
that for s = 3 being the smallest. The interpercentile range is much larger for s = 3 than 
for s = 2 in the left-hand plot because the difference between the corresponding effective 
sample sizes is much larger for case 1. 

Appendix A: Proofs 

Proof of Theorem 3.2. For 1 < i\ < ■ ■ ■ < i n < s and t\ > 0, . . . ,t n > 0, we have 
P(D qtil <t 1 ,...,D q>in <t n ,W q = a) 

(A.l) 

OO 

= E fw{w)P{W q =s\W = w)P(D qM <h,..., D q , in <t n \W = w,W q = s). 

w—s 

Let Mj, j = 1, . . . , n, be the number of renewals not sampled between the ijth and (ij + 
l)th sampled renewals plus 1. If W = w and W q = s are fixed and Mi = mi, . . . , Mj-\ = 
rrij-i, then Mj can take the values 1, . . . , w — m\ — ■ ■ ■ — rrij-i — (s — j) and hence 

P{D qM <t x ,...,D q<in <t n \W = w,W q = s) 

w — (s — 1) w — mi— (s — 2) iv — nil tn n -i — {s—n) 

= E E •■• E cw-cw (a.2) 

m i—l m 2 — 1 m n — 1 

x P(Mi = mi, . . . , M„ = m n |W = W g = s). 

We shall next derive an expression for the probability P(M\ = mi, . . . , M n = m n \W = 
w, W q = s) in (A.2). The indices i\, . . . , i n are not necessarily consecutive, but still form 
separate blocks of consecutive indices (e.g., i\ = 1, £ 2 = 2, £3 = 4, £4 = 8, £5 = 9 form three 
separate blocks of indices {i±, 12}, {£3} and {£4, £5}). We need additional notation to keep 
track of these separate blocks and the gaps between them. Thus, let 1 = j± < j'2 < • • ■ < 
jk 5: n denote subindiccs j in ij for which the corresponding ij is the start of a separate 
block (e.g., with £1 = 1, £2 = 2, £3 = 4, £4 = 8, £5 = 9 above, we have ji = 1, j% = 3, j'3 = 4). 
With this notation, note that r u = j u +i — jut 1 < u < fc — 1 and r^ = n — jk + 1 denote the 
sizes of the k separate blocks. Also, let xq = ij x — 1, x u = ij n+1 —ij u —r u — l, 1 < u < k — 1, 
and Xk — s — i JJt — be the numbers of sampled renewals, respectively, before the ij^h 
sampled renewal, between the (ij u +r„)th and £ Ju+1 th sampled renewals, and after the 
(ij k +rfc)th sampled renewal. 

Given W = w and W q = s, the number of possible distinct locations of the sampled 
renewals satisfying M\ = mi , . . . , M„ = m n can be obtained by considering the possible 
initial locations of the sampled renewals ij u , along with the number of possible distinct 
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locations of the sampled renewals before the i^th, between the (ij u + r u )th and ij.^th, 
and after the (ij k + f/t)th sampled renewal, that is, 

w-fhi fh k — xi x k — (fe— 1) /, N w— ro 2 m k -x 2 x k — (k— 1) , ^ 

e M 5: 2 T 

ui— mn_i— fiik— Xk_i— a: fc — 1 , ^ 

tfe-1 - «fc-2 - ™fc-2 - 1 



X-.-X ^ 

h-l=lk-2+m k -2 + X k -i + l 



ife - h-x - mk-i - 1\ (w - l k - ni k 



E 

ifc='fc-l+mfc-l+a:fc-l + l 



Xk-1 I \ x k 



where l u is the location of the ij„th sampled renewal and fh u = X^l^ r " 1 m v is the num- 
ber of renewals between the ij„th and + r u )th sampled renewals plus 1. By making 
the change of variables l' u = l u — 5Z"=i Wh>> u = 2,...,k, the last expression becomes 

w-m-xi x k -(k-l) . , w—m—X2 x k — (fc— 2) . , , 

E V») £ ( 2 t 

^-m-ajfe-i-Xjt— 1 / , , 



X---X ^ 

x E 

i 'fc='fc-i+ a: '«-i+ i 



fc-1 l fc-2 



l 'k- l 'k-l- 1 \( W - m - l 'k\ = ( W ~ m 

Xk-i J\ x k J \x -\ Vx k + k 



where the last identity can be viewed as a generalization of the identity (3.8) used in the 
proof of Theorem 3.1. Since xq + ■ ■ ■ + x k + k = s — n, we deduce that 

P{Ah =mi, . . .,M n = m n \W = w, W q = s) = ( W ~ m ) I ( W ) . (A.3) 



s — n I \s 



By using (A.1)-(A.3), we have that 

P{D q , il <t l ,...,D q , in <t n ,W q = s) 



Fr> q , n \ s (t) 



fw q 0) 



1 OO 

- ( 5 — 1 ) Ul—1 

e - e c:;;') <*•«> 



u—(s—l) w — mi m n -i — (s — n) 

' w — m 

X 



mi — 1 m n — l 
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oo oo 



m i — l m n — l 



where B s m is given by (3.13). The relation (3.12) is obtained by taking the LT in (A. 4). □ 
Proof of Proposition 3.1. For s > 2 and t > 0, we have 

oo 

P(V q <t,W g = s) = fw(w)P(W q =s\W = w)P{V q <t\W = w,W g = s). (A.5) 

Now, letting M be the number of original renewals between the first and last sampled 
renewals plus 1, we have 

w— 1 

P(V g <t\W = w,W q = s) = J2 Fn m (t)P{M = m\ W = w, W s = s) 

(A.6) 

£*s m (*)(* 



m— 1 

10— 1 

m — 1 \ i ( w 

[w — m) ' 

m— 1 



s-2 I Is 



where (ui — m) is the number of possible distinct locations of the first sampled renewal 
and (™~2 1 ) 1S the number of possible locations of sampled renewals between the first and 
last sampled renewals when M = m. By using (A.5) and (A.6), we deduce that 

- oo oo 

F v,|2+(*) = p(w >2) E P (^ ^ *> W * = s ) = E C m FTV), (A.7) 

^ 9 — ' s=2 m=l 

which can also be written as in (3.14). □ 
Proof of Proposition 4.1. The equality in (4.2) follows from (3.1) and 

£(;)^£(:)«-a-.™.) 

= £w.)('-.rE(;)C) 

w—n s—n K / v / 

= £ - *r- Q e (") - e /^Nd - «)- b Q 2—. 

w—n ^ ' s—n ^ ' w—n ^ ' 
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Then, by the assumption (4.2), we can substitute (3.1) into the right-hand side of (4.1) 
and use Fubini's theorem to obtain 

wl \s 



n—w s—w \ * \ * 



n—w 

= E M»)d - iT- w [ I) E (7)H) fe = fw(w), 

n=w \ ' k=0 ^ ' 

where we have used the identity J2k=a (*D M) fe = if .fT > 1, and = 1 if = 0. □ 

Proof of Proposition 4.2. We consider only the case I = 2 (and z 2 = 0). Note that 

i 
it 



^(x)„=E(;j^ (1) Wi(-^r" 

i—n ^ * 



fe — z 



fc=n y i=n \ / \ / 

£ 3 (!)e (*::)<-*>'-"<- 



k—n 



k—i 



The change of the order of summation above can be justified by using Fubini's theorem 
and the assumption (4.5). □ 

Proof of Proposition 4.3. The proof is similar to that of Propositions 4.1 and 4.2, 
and only the case I = 2 (and z 2 = 0) will be considered. By using (3.1), we observe that 

^,>.-£(:) ' n " (i ; , " , - E 

n trj N / L. , \ / 
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=EM*)i;f*)(^(* l -(i-«))'- n (i-9)* 



\k-i 



(A.8) 



= £>(*/' 

where the order of the summation above is changed using Fubini's theorem. Indeed, the 
application of Fubini's theorem is possible as long as \zi — (l — q)\ + (l — q) < 1 or Z\ G Co, 
which is one of the conditions on z\. By using (A.8), we further get that 

GO / \ OO / \ 

T^iM,)n=E u)(-*)^£ (M-'fwik) 

i=n ^ ' k=i ^ ' 

OO k 



k—n 



where the use of Fubini's theorem is justified by the fact that \—z\\ + \zi\ < 1 or Z2 = G 
Ci, which is one of the conditions on 22. □ 

Proof of Proposition 4.4. Write TW^ 1 )) = TWf^f 3 ) + T' 1 )^ 1 '), where 



0, if.- ?2 ' s I d X) , i£s>j 



and observe that, for fixed j, T^(£i) and T^(^^) are independent. We can then also 
write 

TWfcW^-rW^^ (A.9) 

where T^ 2 )^ 1 -*)^ and T^ 2 -*^^):/',™ are independent for fixed j. By Proposition 4.2, we 
have that 

s—w ^ ' 

By using the fact that the two terms in (A.9) are Gaussian and independent, we can then 
write 

EeWrWfcW),,. = e -(i/2)e=<„ e - (1/2)^4^ (A 1Q) 
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where 



•> if 



rl <w = E{T^{^). w f. 



In view of (A. 10), for T^\^ l ')j jW to converge in distribution, it is necessary that the 
series <j\ w converges, that is, that (4.13) holds. (In other words, if (4.13) does not hold, 
then the limit of (A. 10) is zero.) □ 



Proof of Proposition 4.5. The proof is in two steps: (a) yN(fw q ~ fw q ) ~ >d £ ha the 
space l o,q- 1 (b'+i-q)i where £ is the process appearing in the proof of Theorem 4.1; (b) 
the mapping S in (4.4) is continuous from l o,q- 1 {b'+i-q) to /oo,&- 

For (a), it is enough to show the tightness of vN(f\Va — fw q ) hi l o,q- 1 (b'+i-q)- By the 
results of [23], page 229 (see also [4], Theorem 2.5), this follows from the fact that for 
any e > 0, 

lim supP(sup( (Z - 1 (6' + l- g )) s |v^V(/^( S )-/w,(s))|>e) =0. (A.ll) 
For the latter, observe that 
E p(sup( y + ^ ~ q y\VN(f Wq (s) - f Wg (s))\ > e 

^ E \ ) E\VN(f Wq ( S )~f Wq (s))\ 

< E (^)V £ i^(mw-mw)i 2 < E C-^-X M*) 1/2 

s=m+l V " / s=m+l \ " / 

and that (A.ll) follows from the assumption (4.20). 
For (b), observe that 



\\S(x) - S(y)||oo, 6 = supb n \S(x) n - S(y) n \ < V b n \S(x) n - S(y) n 

< f>" E (!) t^i^ - wi =E \xi - vi\ 



n>0 

i\ , v^, Jb+l-q 

i Fi ~ Vi \ = 2^ 

n— i— n v ' 2—0 

< sup \x n -y n \y, \ 7771 

= \\x - l/Hoo,g-'(y+i- g ) E( y 1 1 - o ) " D 
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Proof of Proposition 4.6. With X N = VN(A S - A s ) and Y N = VN(F Dq i]s - F Dq<ila ), 
it is enough to show that (a) the finite-dimensional distributions of (X N ,Y N ) converge 
to those of {(,Z); (b) the sequence (X N ,Y N ) is tight. The latter condition follows 
from the following two conditions: (bl) the sequence X N = {X^} n ^ is tight; (b2) 
the sequence Y N is tight. The tightness of Yn in (b2) is a standard result, part of 
the empirical central limit theorem (see, e.g., [20], Section V.2) and will not be proven 
here. By the results of [23], the condition (bl) can be replaced by (bl) for any e > 0, 
lim m ^oo sup^ P(sup„ >m zg\X% | > e) = 0. 

For the latter (bl), it is enough to show the same relation, but where X N is replaced 
by X N , which is defined without q s /fw q (s) and q s /fw q (s) m A s and A s , respectively. 
With the sequence X N , observe that 



Vi>m 



eP(wpzS\X?\>e) < J2 z oE\Xn\ 

n— m+1 

oo oo / \ 

< E E %E\VN(f w iw)-f w (w))\r2")(l-qY 



n— m+1 w—n-\-s—l 

oo w—s-{-l / 

= E\VN(fw(w)-fw(w))\(l-qr- s £ (" 

w—m+s n— m+1 ^ 

oo 

< E\VN(fw(w)-f w (w))\(l-qr- s z^- s+1 ( W ~ m 



w—m+s 



S 



where we have used the fact that z > 1 and YZZ^+i (^-D = ("'7") ■ Sincc E\y/N(fw(w) 
fw(w))\ 2 < Rq.w, as in the proof of Theorem 4.1, we further get that 



eP(BiipzS\X»\> e )< £ s]E\VN(f w (w)-f w (w))\ 2 {l-qy 



w — m 



w— m+s x 

w — m 
s 



w—m-\-s 



The condition (bl) now follows from the assumption (4.36). 

For the convergence of the finite-dimensional distributions in (a), we show only the 
convergence of X^ (the general case can be considered along similar lines). For j > 
n + s — 1 , define 

X^ n = VN(Ai n ~Ai n ) 7 

where 



/w, («) w= „ 



•u,, ^— Y fw(w)[_ i )(i- q y 



s-l 



w — n 
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and similarly with A\ , and also 



j-j V A s,n q s v> „ u . fw-n\ 

Ci = —f — tt + 7 — rr s (0w[ 1 (l-g) 



It is enough to show that: 

(i) Xf„4a asTV^w; 

(ii) Q^Cn as j^oo; 

(hi) for any <5 > 0, limsup^^ limsup^^^ P{\X^ n — X%\ > 5) = 0. 

The convergence in (i) follows directly from Theorem 4.1. The convergence in (ii) follows 
using -E|S'(£) l „| < y/ES^) 2 , < y/R q , w and the assumption (4.36). Condition (hi) can be 
proven as in part (bl) above using the assumption (4.36). □ 

Proof of Lemma 5.1. Using integration by parts, we obtain 

°° s s a s s r°° 

w s a w dw = + — / w^a™ dw 

in a 1 In a 1 J s 

s s a s s s a s s 2 f°° 

T + 71 — ^U2 + n — =n5 / w'-Vdiu 
lna 1 (In a (ma J s 

- 1 s s f°° 

< s s a s > - t-t: + 7- tt- / a w dw 

f^ i {lna- 1 ) k (lna- 1 ) 5 J s 



1 s s a s (lna- 1 )- 8 - 1 - 1 



^(lna-^fe lna- 1 (lna- 1 )- 1 -! 



from which the bound (5.3) follows. The inequality (5.4) can be proven similarly. □ 



Appendix B: Bounds on remainder terms in Taylor 
expansions of compositions of power 
series 

Here, we prove a result which was used several times in the proof of Theorem 4.3. 

Proposition B.l. For any formal power series x(z) = Y]°°_ 1 x n z n , y{z) = Y^Li DnZ n , 
£ i z ) = S^Li £ nZ n and n > 1, we have 

\(xo(y + e)-xo y)J < ((zW o (y + + £+ )) * £+ )„, (B.l) 
\(x o (y + e) - x o y - (x^ o y )*e)J< ^{{x^ o (y+ + s+)) * e+ * £+)„. (B.2) 
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Proof. For N > 1, consider new formal power series xn{z) = X^=i x « z "' Vn{z) = 

Hn=i VnZ n and e N (z) = 5Z„=i ^z"- Since the power series x N o (y N + e N ) - x N o y N 

and x o (y + e) — x o y, and (a^V (?/•#,+ + £ JV,+)) * £ n,+ and (je+^ o (y + + * e + have 
the same first N elements, and since N is arbitrary, it is enough to prove (B.l) for any 

xn, yN and ejy. Letting f m (ux, . . . , ujv) = £W»Q^k=i u kZ k ) m be a real-valued function on 
IR^, and with -D/ m denoting the gradient of f m , observe that 

N / I N \ m /N 

(xn o (yN +£n) - x N o y N )(z) = ^2 x m I I ^2(Vk +£k)z k J -\Z2vkZ k \ J 

m=l \ \fc=l / \fc=l / / 

N 



ZZUmiVl + £l, ■ ■ ■ ,VN + £n) ~ fm(Vl, ■ ■ ■ ,Vn)) 
<n=l 

(B.3) 

N 

Cm, 1? * • * 3 ^m,iv) 



m— 1 



JV / N \ ™- 1 iV 

= ^2 x m m ^c m ^z fe y^£fc2 fc , 

m=l \k=l / fc=l 

where A T is the transpose of the matrix A and (c m> \, . . . , c m .jv) = (1 — t m )(yi, . . . ,yj\r) + 
t m {yi + £i, ■ ■ ■ 7 Un + £n) for some t m € [0, 1]. On the other hand, observe that 

N / N \ m ~ 1 N 

((^. ) + o( yAr , + + £w , + ))* £Ar , + )(z)=^x+m ^(y++ £ +)z fe J2 e t z "- ( B - 4 ) 

m=l \fe=l / fc=l 

Since |c m ^| < y£ + , it is clear that the absolute value of the nth element of (B.3) is 
less than or equal to the nth element of (B.4). This completes the proof of (B.l). 
The proof of (B.2) is similar, but involves the observation that 

{xn o (y N + e N ) - x N o y N - (x^ o y N ) * £ N )(z) 
N 

= ^2 (/m(yi + El, • • • ,VN + £n) ~ fm(Vl, ■ ■ • , Dn) ~ [El • • • £iv] T -D/m(2/l, • ■ • , 2/iv)) 
m— 1 

l w 

= ^ [ £l ' " ^r^/mtrfmj, ■ • ■ , d„»,jv)[ei • • • Ejy] 

m— 1 

TV / JV \ " l ~ 2 / JV \ 2 

= -^x m m(m-l) ^d m ,fcz fc X^*^ ' 



2 

m— 1 1 / \k— 1 



where -D 2 / denotes the Hessian of / m and (d TO) i, . . . , d m> jf) = (1 — t m )(yx, ■ ■ ■ ,Vn) + 
*m(yi +£i,- • - ,2/jv +£n) with i m e [0,1]. □ 
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